Skip to content

fix(api): isolate side-effect session writes in multimodal and RAG handlers#38210

Open
agarwalpranav0711 wants to merge 6 commits into
langgenius:mainfrom
agarwalpranav0711:fix/isolate-side-effect-session-writes
Open

fix(api): isolate side-effect session writes in multimodal and RAG handlers#38210
agarwalpranav0711 wants to merge 6 commits into
langgenius:mainfrom
agarwalpranav0711:fix/isolate-side-effect-session-writes

Conversation

@agarwalpranav0711

Copy link
Copy Markdown

Important

  1. Make sure you have read our contribution guidelines
  2. Ensure there is an associated issue and you have been assigned to it
  3. Use the correct syntax to link this PR: Fixes #<issue number>.

Summary

Fixes #38176

Extends the session isolation fix from #37895 to two files that were missed: base_app_runner.py and index_tool_callback_handler.py.

Both files used db.session.commit() on the shared Flask request-scoped session to persist side-effect writes (a MessageFile during multimodal LLM streaming, and DatasetQuery audit logs / hit_count updates during RAG retrieval). This flushed all pending ORM changes in the request transaction, not just the intended record, risking premature commits and DetachedInstanceError under concurrency.

Changes:

  • base_app_runner.py: _handle_multimodal_image_content() now persists the MessageFile through an independent sessionmaker session instead of the shared session.
  • index_tool_callback_handler.py: on_query() and on_tool_end() now persist audit/analytics writes through an independent session, following the same pattern as fix(api): isolate side-effect database writes #37895.
  • Added/updated unit tests in both corresponding test files to cover the new session isolation behavior.

Screenshots

N/A - backend transaction/session fix

Checklist

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran make lint && make type-check (backend) and cd web && pnpm exec vp staged (frontend) to appease the lint gods

…ndlers

Extends the session isolation fix from langgenius#37895 to two files that were
missed: base_app_runner.py and index_tool_callback_handler.py.

Both files used db.session.commit() on the shared Flask request-scoped
session to persist side-effect writes (a MessageFile during multimodal
LLM streaming, and DatasetQuery audit logs / hit_count updates during
RAG retrieval). This flushed all pending ORM changes in the request
transaction, not just the intended record, risking premature commits
and DetachedInstanceError under concurrency.

Both now use independent sessionmaker sessions matching the
established pattern from langgenius#37895.

Fixes langgenius#38176
@dosubot dosubot Bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Jun 30, 2026
@github-actions

github-actions Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Pyrefly Diff

base → PR
--- /tmp/pyrefly_base.txt	2026-07-03 07:09:49.851446131 +0000
+++ /tmp/pyrefly_pr.txt	2026-07-03 07:09:34.585416161 +0000
@@ -4261,17 +4261,17 @@
 ERROR Object of class `NoneType` has no attribute `status` [missing-attribute]
    --> tests/unit_tests/core/base/test_app_generator_tts_publisher.py:188:16
 ERROR Argument `None` is not assignable to parameter `app_id` with type `str` in function `core.callback_handler.index_tool_callback_handler.DatasetIndexToolCallbackHandler.__init__` [bad-argument-type]
-  --> tests/unit_tests/core/callback_handler/test_index_tool_callback_handler.py:63:20
+  --> tests/unit_tests/core/callback_handler/test_index_tool_callback_handler.py:87:20
 ERROR Argument `None` is not assignable to parameter `message_id` with type `str` in function `core.callback_handler.index_tool_callback_handler.DatasetIndexToolCallbackHandler.__init__` [bad-argument-type]
-  --> tests/unit_tests/core/callback_handler/test_index_tool_callback_handler.py:64:24
+  --> tests/unit_tests/core/callback_handler/test_index_tool_callback_handler.py:88:24
 ERROR Argument `None` is not assignable to parameter `user_id` with type `str` in function `core.callback_handler.index_tool_callback_handler.DatasetIndexToolCallbackHandler.__init__` [bad-argument-type]
-  --> tests/unit_tests/core/callback_handler/test_index_tool_callback_handler.py:65:21
+  --> tests/unit_tests/core/callback_handler/test_index_tool_callback_handler.py:89:21
 ERROR Argument `None` is not assignable to parameter `invoke_from` with type `InvokeFrom` in function `core.callback_handler.index_tool_callback_handler.DatasetIndexToolCallbackHandler.__init__` [bad-argument-type]
-  --> tests/unit_tests/core/callback_handler/test_index_tool_callback_handler.py:66:25
+  --> tests/unit_tests/core/callback_handler/test_index_tool_callback_handler.py:90:25
 ERROR Argument `None` is not assignable to parameter `query` with type `str` in function `core.callback_handler.index_tool_callback_handler.DatasetIndexToolCallbackHandler.on_query` [bad-argument-type]
-  --> tests/unit_tests/core/callback_handler/test_index_tool_callback_handler.py:69:26
+  --> tests/unit_tests/core/callback_handler/test_index_tool_callback_handler.py:93:26
 ERROR Argument `None` is not assignable to parameter `dataset_id` with type `str` in function `core.callback_handler.index_tool_callback_handler.DatasetIndexToolCallbackHandler.on_query` [bad-argument-type]
-  --> tests/unit_tests/core/callback_handler/test_index_tool_callback_handler.py:69:32
+  --> tests/unit_tests/core/callback_handler/test_index_tool_callback_handler.py:93:32
 ERROR Argument `list[DummyToolInvokeMessage]` is not assignable to parameter `tool_outputs` with type `Iterable[ToolInvokeMessage]` in function `core.callback_handler.workflow_tool_callback_handler.DifyWorkflowCallbackHandler.on_tool_execution` [bad-argument-type]
   --> tests/unit_tests/core/callback_handler/test_workflow_tool_callback_handler.py:55:30
 ERROR Argument `list[DummyToolInvokeMessage]` is not assignable to parameter `tool_outputs` with type `Iterable[ToolInvokeMessage]` in function `core.callback_handler.workflow_tool_callback_handler.DifyWorkflowCallbackHandler.on_tool_execution` [bad-argument-type]

@github-actions

github-actions Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Pyrefly Type Coverage

Metric Base PR Delta
Type coverage 52.27% 52.27% 0.00%
Strict coverage 51.78% 51.78% 0.00%
Typed symbols 31,977 31,977 0
Untyped symbols 29,478 29,478 0
Modules 2969 2969 0

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a transactional safety issue in the API by isolating “side-effect” ORM writes (audit logging, analytics counters, and message file persistence) from the shared Flask request-scoped db.session, preventing unintended commits/flushes of unrelated pending changes during multimodal streaming and RAG callbacks.

Changes:

  • Persist MessageFile in AppRunner._handle_multimodal_image_content() using an independent SQLAlchemy session (sessionmaker(...).begin()), instead of db.session.commit().
  • Persist DatasetQuery audit rows in DatasetIndexToolCallbackHandler.on_query() using an independent session, and update DocumentSegment.hit_count in on_tool_end() using an independent Session.
  • Update/add unit tests to assert the new session-isolation behavior (i.e., that db.session.commit() is not used for these side effects).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
api/core/app/apps/base_app_runner.py Writes MessageFile via an independent session to avoid committing/closing the request-scoped session during multimodal streaming.
api/core/callback_handler/index_tool_callback_handler.py Isolates audit log persistence and hit-count updates into independent sessions to avoid mutating caller transaction boundaries.
api/tests/unit_tests/core/app/apps/chat/test_base_app_runner_multimodal.py Updates multimodal tests to validate isolated session usage (no db.session.commit() for side effects).
api/tests/unit_tests/core/callback_handler/test_index_tool_callback_handler.py Updates callback handler tests to validate isolated session usage for audit/hit-count paths.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@FFXN

FFXN commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Hi, @agarwalpranav0711 please resolve the conflicts.

@agarwalpranav0711

Copy link
Copy Markdown
Author

On it, resolving now. Will push shortly.

…ct-session-writes

# Conflicts:
#	api/core/callback_handler/index_tool_callback_handler.py
#	api/tests/unit_tests/core/callback_handler/test_index_tool_callback_handler.py
@agarwalpranav0711

Copy link
Copy Markdown
Author

@FFXN Conflicts resolved and pushed. Also synced with the latest main (#38309) — the session-threading refactor there works cleanly alongside our independent-session fix with zero conflicts. All tests passing (1058/1058) across callback_handler, app/apps, rag/retrieval, and tools modules, plus lint and format checks clean. Ready for another look whenever you have time, thanks!

@agarwalpranav0711

Copy link
Copy Markdown
Author

@FFXN @laipz8200 @QuantumGhost Just checking in — this has been open
for a few days now with all CI checks passing and conflicts resolved.
Happy to make any changes needed. Let me know if there's anything
blocking review on your end.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Shared db.session used for side-effect writes in multimodal streaming and RAG callback handlers

3 participants